Algorithms 1: Principles and Applications of Maximum Likelihood Estimation

林嶔 (Lin, Chin)

Lesson 17

Introduction to Maximum Likelihood Estimation (1)

– Let's first review how the closed-form solution for linear regression is derived:

  1. First, define the prediction equation (y = b0 + b1 * x)

  2. Then, define the objective (minimize the sum of squared residuals)

  3. Take partial derivatives of the squared-residual equation and solve (to find where the minimum occurs)
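Carrying out step 3 by hand yields the familiar closed-form solution: b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1 · x̄. A quick sanity check in R, using a small made-up sample:

```r
# Made-up sample data for illustration
x = c(1, 2, 3, 4, 5)
y = c(6, 7, 9, 8, 10)

# Closed-form least-squares estimates from step 3
b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 = mean(y) - b1 * mean(x)
c(b0, b1)
## [1] 5.3 0.9

# The built-in solver agrees
lm(y ~ x)$coefficients
```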

Introduction to Maximum Likelihood Estimation (2)

– First consider a simpler problem. Suppose the prevalence of a certain disease is 1/3 in Taipei, 1/2 in Taichung, and 2/3 in Kaohsiung, and we have drawn a sample of 80 people in which 49 have the disease and 31 do not. In which city was our sample most likely drawn?

– With that in mind, try computing the probability of drawing 49 diseased and 31 healthy individuals in Taipei (use the binomial distribution, but keep its underlying assumptions in mind):

Taipei.p = choose(80, 49) * (1/3)^49 * (1 - 1/3)^31
print(Taipei.p)
## [1] 2.078886e-07

– Next, compute the corresponding sample probabilities for Taichung and Kaohsiung:

Taichung.p = choose(80, 49) * (1/2)^49 * (1 - 1/2)^31
print(Taichung.p)
## [1] 0.0118359
Kaohsiung.p = choose(80, 49) * (2/3)^49 * (1 - 2/3)^31
print(Kaohsiung.p)
## [1] 0.05449674
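The same three probabilities can be obtained with the built-in function "dbinom", which evaluates the binomial formula choose(n, k) * p^k * (1 - p)^(n - k) directly:

```r
# dbinom(k, size = n, prob = p) gives the binomial probability of k events in n trials
dbinom(49, size = 80, prob = 1/3)
## [1] 2.078886e-07
dbinom(49, size = 80, prob = 1/2)
## [1] 0.0118359
dbinom(49, size = 80, prob = 2/3)
## [1] 0.05449674
```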

Introduction to Maximum Likelihood Estimation (3)

– Generalizing this calculation, we can treat the prevalence p as an unknown and write the sample probability as a function of p:

sample.p = function (p) {
  choose(80, 49) * (p)^49 * (1 - p)^31
}
sample.p(0.1)
## [1] 5.459073e-29
sample.p(0.3)
## [1] 5.402339e-09
sample.p(0.5)
## [1] 0.0118359
sample.p(0.7)
## [1] 0.02270722
sample.p(0.9)
## [1] 8.193775e-12

– With this concept in hand, how do we actually solve for p?

Introduction to Maximum Likelihood Estimation (4)

– A simple approach is a grid search: evaluate the sample probability over a fine grid of candidate values of p and pick the maximum:

seq.p = seq(0, 1, by = 0.001)
result = numeric(length(seq.p))
for (i in 1:length(seq.p)) {
  result[i] = sample.p(seq.p[i])
}

which.max(result)
## [1] 613
seq.p[which.max(result)]
## [1] 0.612
plot(seq.p, result, type = "l", xlab = "Population parameter", ylab = "Sample probability")
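Instead of a grid search, the base-R function "optimize" can locate the maximum of a one-parameter function directly; a sketch using the sample.p function defined above:

```r
# Sample probability as a function of the prevalence p (as defined above)
sample.p = function (p) {
  choose(80, 49) * (p)^49 * (1 - p)^31
}

# One-dimensional search for the maximum over p in [0, 1]
opt = optimize(sample.p, interval = c(0, 1), maximum = TRUE)
opt$maximum   # close to the analytic answer 49/80 = 0.6125
```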

Introduction to Maximum Likelihood Estimation (5)

F17_1

– We will use the function "mle" from the package "stats4" (but it can only search for a minimum, not a maximum, so we have to rewrite our sample.p function accordingly):

– Note that "mle" does not solve the problem by mathematical differentiation; the method it actually uses will be explained in detail in the next lesson!

library(stats4)
sample.p = function (p) {
  -choose(80, 49) * (p)^49 * (1 - p)^31   # negated, since mle minimizes
}
fit = mle(sample.p, start = list(p = 0.5), method = "SANN")
fit
## 
## Call:
## mle(minuslogl = sample.p, start = list(p = 0.5), method = "SANN")
## 
## Coefficients:
##         p 
## 0.6124266
49/80
## [1] 0.6125
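Minimizing the raw negative probability works here, but with larger samples the probabilities underflow to zero (an issue discussed later in this lesson). A numerically safer variant, sketched below, minimizes the negative log-likelihood instead; "dbinom" with log = TRUE returns the log of the binomial probability:

```r
library(stats4)

# Negative log-likelihood of observing 49 cases among 80 draws
neg.loglik = function(p) {
  if (p <= 0 || p >= 1) return(1e10)   # keep the search inside (0, 1)
  -dbinom(49, size = 80, prob = p, log = TRUE)
}
fit = mle(neg.loglik, start = list(p = 0.5), method = "SANN")
fit@coef   # should again land near 49/80 = 0.6125
```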

Exercise 1

– Use maximum likelihood estimation to estimate the mean and standard deviation of a normal distribution from the following sample:

x = c(1, 7, 5, 6, 8, 3, 2, 9, 4, 5, 3)

– If you are unsure how to obtain probabilities (densities) from the normal distribution, try the function "dnorm":

dnorm(0, mean = 0, sd = 1)
## [1] 0.3989423
dnorm(0, mean = 0, sd = 2)
## [1] 0.1994711
dnorm(0, mean = 1, sd = 2)
## [1] 0.1760327
dnorm(1, mean = 0, sd = 2)
## [1] 0.1760327

– Compare your estimates with the direct calculation:

mean(x)
## [1] 4.818182
sd(x)
## [1] 2.522625
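One possible solution sketch for Exercise 1 (the function and parameter names below are my own choices): treat mu and sigma as unknown parameters and hand the negative log-likelihood to "mle". Note that the maximum-likelihood estimate of the standard deviation divides by n rather than n - 1, so it should come out slightly smaller than sd(x):

```r
library(stats4)
x = c(1, 7, 5, 6, 8, 3, 2, 9, 4, 5, 3)

# Negative log-likelihood of the sample under Normal(mu, sigma)
normal.loglik = function(mu, sigma) {
  if (sigma <= 0) return(1e10)   # sigma must stay positive
  -sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}
fit = mle(normal.loglik, start = list(mu = 5, sigma = 2), method = "SANN")
fit@coef   # mu near mean(x) = 4.818; sigma near sd(x) * sqrt(10/11) = 2.405
```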

Solving Linear Regression with Maximum Likelihood Estimation (1)

  1. First, define the prediction equation (y = b0 + b1 * x)

  2. Then, define the objective (maximize the sample probability)

– One caveat: the probability of each individual observation is very small, and multiplying them together makes the result smaller still, possibly too small for the computer to represent. Working with logarithms solves this problem effectively!

– Also note that while the probabilities were originally multiplied together, after taking logarithms they are added!
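A quick illustration of both points: multiplying many small per-observation probabilities underflows to zero in double precision, while the sum of their logarithms remains perfectly representable:

```r
# 100 hypothetical observations, each with probability 1e-5
p = rep(1e-5, 100)

prod(p)       # the product underflows to 0
## [1] 0
sum(log(p))   # the log-likelihood is still finite
## [1] -1151.293
```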

x = c(1, 2, 3, 4, 5)
y = c(6, 7, 9, 8, 10)

linear.p = function(b0, b1) {
  y.hat = b0 + b1 * x    # predicted values
  res = y - y.hat        # residuals
  mean.res = 0
  sd.res = sd(res)
  log_p.res = dnorm(res, mean = mean.res, sd = sd.res, log = TRUE)  # log-density of each residual
  return(-sum(log_p.res))  # negative log-likelihood, since mle minimizes
}

fit1 = mle(linear.p, start = list(b0 = 0, b1 = 0), method = "SANN")
fit1
## 
## Call:
## mle(minuslogl = linear.p, start = list(b0 = 0, b1 = 0), method = "SANN")
## 
## Coefficients:
##        b0        b1 
## 5.3000118 0.8987946

Solving Linear Regression with Maximum Likelihood Estimation (2)

fit1
## 
## Call:
## mle(minuslogl = linear.p, start = list(b0 = 0, b1 = 0), method = "SANN")
## 
## Coefficients:
##        b0        b1 
## 5.3000118 0.8987946
fit2 = lm(y~x)
fit2
## 
## Call:
## lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##         5.3          0.9

Solving Linear Regression with Maximum Likelihood Estimation (3)

vcov(fit2)
##             (Intercept)           x
## (Intercept)   0.6966667 -0.19000000
## x            -0.1900000  0.06333333
slotNames(fit1)
## [1] "call"      "coef"      "fullcoef"  "vcov"      "min"       "details"  
## [7] "minuslogl" "nobs"      "method"
slot(fit1, "vcov")
##            b0          b1
## b0  0.4370205 -0.11400892
## b1 -0.1140089  0.03800375
fit1@vcov
##            b0          b1
## b0  0.4370205 -0.11400892
## b1 -0.1140089  0.03800375
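The diagonal of the variance-covariance matrix contains the squared standard errors of the coefficients, so standard errors can be read off with sqrt(diag(...)); a sketch using the same small data set:

```r
x = c(1, 2, 3, 4, 5)
y = c(6, 7, 9, 8, 10)
fit2 = lm(y ~ x)

# Standard errors are the square roots of the vcov diagonal
sqrt(diag(vcov(fit2)))   # about 0.835 for the intercept and 0.252 for the slope
```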

Solving Linear Regression with Maximum Likelihood Estimation (4)

x = rnorm(1000)
y = 3 + 2 * x + rnorm(1000)

fit1 = mle(linear.p, start = list(b0 = 0, b1 = 0), method = "SANN")
fit2 = lm(y~x)

fit1@coef
##       b0       b1 
## 3.034672 1.981693
fit2$coefficients
## (Intercept)           x 
##    3.036488    1.983512
fit1@vcov
##               b0            b1
## b0  9.528576e-04 -7.396710e-06
## b1 -7.396710e-06  1.004551e-03
vcov(fit2)
##               (Intercept)             x
## (Intercept)  9.538094e-04 -7.418096e-06
## x           -7.418096e-06  1.006548e-03

Exercise 2

F17_2

– This can be rewritten as:

F17_3

x = 0:10
y = c(0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1)

b0 = -3
b1 = 0.5
p = exp(b0 + b1 * x)/(1 + exp(b0 + b1 * x))
p
##  [1] 0.04742587 0.07585818 0.11920292 0.18242552 0.26894142 0.37754067
##  [7] 0.50000000 0.62245933 0.73105858 0.81757448 0.88079708

– Compare your results with the function "glm":

fit3 = glm(y~x, family = "binomial")
fit3
## 
## Call:  glm(formula = y ~ x, family = "binomial")
## 
## Coefficients:
## (Intercept)            x  
##     -1.4719       0.3409  
## 
## Degrees of Freedom: 10 Total (i.e. Null);  9 Residual
## Null Deviance:       15.16 
## Residual Deviance: 12.64     AIC: 16.64
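One possible solution sketch for Exercise 2 (the function name is my own): write the negative log-likelihood of the Bernoulli outcomes under the logistic model and hand it to "mle"; the estimates should land close to glm's -1.47 and 0.34. Here "plogis" is used as a numerically stable form of exp(...)/(1 + exp(...)):

```r
library(stats4)
x = 0:10
y = c(0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1)

# Negative log-likelihood under the logistic model
logistic.loglik = function(b0, b1) {
  p = plogis(b0 + b1 * x)   # logistic function: exp(.)/(1 + exp(.))
  -sum(dbinom(y, size = 1, prob = p, log = TRUE))
}
fit = mle(logistic.loglik, start = list(b0 = 0, b1 = 0), method = "SANN")
fit@coef   # should be close to glm's (-1.4719, 0.3409)
```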

Fitting a Custom Relationship (1)

  1. First, define the prediction equation

  2. Then, define the objective

– Please download the example data here.

dat = read.csv("data/Patient2.csv")
head(dat)
##            t        Y
## 1 0.00000000 232.9187
## 2 0.08333333 234.6244
## 3 0.16666667 234.1379
## 4 0.25000000 233.6271
## 5 0.33333333 232.8027
## 6 0.41666667 231.9975

– Based on our medical and physics knowledge, we expect the relationship between time and the amount of residual radioactive material to be as follows:

F17_4

D_1m is the predicted value

D_1m(0) is the initial value

T1 is the biological (in-body) half-life

T2 is the physical half-life (fixed at 175.2)

k is the ratio of radioiodine metabolized inside versus outside the body

t is time

Fitting a Custom Relationship (2)

F17_4

predict.func = function (D_1m.0, T1, T2 = 175.2, k, t) {
  D_1m = D_1m.0*((1-k)*exp(-log(2)*t/T1)+k*exp(-log(2)*t/T2))
  return(D_1m)
}

predict.func(D_1m.0 = 250, T1 = 3, k = 0.7, t = 0)
## [1] 250
predict.func(D_1m.0 = 250, T1 = 3, k = 0.7, t = 1)
## [1] 233.8366
predict.func(D_1m.0 = 250, T1 = 3, k = 0.7, t = 2)
## [1] 220.8678
X = dat$t[1:350]
Y = dat$Y[1:350]

prob.sample = function(D_1m.0, T1, k) {
  pred.Y = predict.func(D_1m.0 = D_1m.0, T1 = T1, k = k, t = X)
  res = log(Y) - log(pred.Y)   # residuals on the log scale
  mean.res = 0
  sd.res = sd(res)
  log_p.res = dnorm(res, mean = mean.res, sd = sd.res, log = TRUE)
  return(-sum(log_p.res, na.rm = TRUE))   # negative log-likelihood
}

prob.sample(D_1m.0 = 250, T1 = 5, k = 0.1)
## [1] 575.2135
prob.sample(D_1m.0 = 200, T1 = 3, k = 0.2)
## [1] 1147.613

Fitting a Custom Relationship (3)

fit = mle(prob.sample, start = list(D_1m.0 = 250, T1 = 10, k = 0.1), method = "SANN")
fit
## 
## Call:
## mle(minuslogl = prob.sample, start = list(D_1m.0 = 250, T1 = 10, 
##     k = 0.1), method = "SANN")
## 
## Coefficients:
##      D_1m.0          T1           k 
## 243.9800863   9.7114614   0.1224558
NEW.X = dat$t[351:459]
NEW.Y = dat$Y[351:459]

pred.Y = predict.func(D_1m.0 = fit@coef[1],
                      T1 = fit@coef[2],
                      T2 = 175.2,
                      k = fit@coef[3],
                      t = NEW.X)

sum((log(NEW.Y) - log(pred.Y))^2)
pred.Y_1 = predict.func(D_1m.0 = fit@coef[1],
                      T1 = fit@coef[2],
                      T2 = 175.2,
                      k = fit@coef[3],
                      t = seq(0, 40, by = 0.1))

plot(dat$t, dat$Y, pch = 19, xlab = "time", ylab = "Y")
lines(seq(0, 40, by = 0.1), pred.Y_1, col = "red")

Fitting a Custom Relationship (4)

log.Y = log(Y)

fit_log.lm = lm(log.Y~X)
pred.Y_log = fit_log.lm$coefficients[1] + NEW.X * fit_log.lm$coefficients[2]

sum((log(NEW.Y) - pred.Y_log)^2)

– Let's draw a plot to take a look:

pred.Y_2 = exp(fit_log.lm$coefficients[1] + seq(0, 40, by = 0.1) * fit_log.lm$coefficients[2])

plot(dat$t, dat$Y, pch = 19, xlab = "time", ylab = "Y")
lines(seq(0, 40, by = 0.1), pred.Y_1, col = "red")
lines(seq(0, 40, by = 0.1), pred.Y_2, col = "blue")

Summary

  1. A more complex relationship is not necessarily better; what actually matters is how well it matches reality. If you cannot fully understand the underlying relationship, a simpler model is sometimes the wiser choice.

  2. In the last example, did everyone's "mle" results come out slightly different? Try changing the starting values; what do you notice?

F17_5